A09.TEX[106,PHY] - www.SailDart.org

perm filename A09.TEX[106,PHY] blob sn#848171 filedate 1987-11-04 generic text, type C, neo UTF8
COMMENT ⊗   VALID 00002 PAGES
C REC  PAGE   DESCRIPTION
C00001 00001
C00002 00002	\magnification\magstephalf
C00042 ENDMK
C⊗;
\magnification\magstephalf
\input macro.tex
\def\today{\ifcase\month\or
  January\or February\or March\or April\or May\or June\or
  July\or August\or September\or October\or November\or December\fi
  \space\number\day, \number\year}
\baselineskip 14pt
\rm
\line{\sevenrm a09.tex[106,phy] \today\hfill}


\def\boxbinary#1%
{\hbox to 1em{\hbox to 1em{\hfil#1\hfil}\hskip-1.05em minus1em
\vbox to 9.8222pt{\hrule height .5pt
\hbox to 1.2em{\vrule height 1.2em width .5pt\hfil\vrule height 
1.2em width .5pt}
\hrule height .5pt\vss}\hskip-.1emminus1em}}

\def\boxp{\mathbin{\boxbinary{$+$}}}
\def\boxm{\mathbin{\boxbinary{$-$}}}
\def\boxd{\mathbin{\boxbinary{$/$}}}
\def\boxa{\mathbin{\boxbinary{$\ast$}}}

\bigskip
\noindent
{\bf Numerical Precision.} (Must follow section on number representation.)

If I write $A:=B+C$ in a Pascal program, where $A$, $B$, and $C$ are of
type REAL, $A$~is not actually set to the sum of $B$ and~$C$; rather
it is set to a represented number close to the true sum, while the
true sum may not be one of the representable numbers. To draw attention
to the inexactness of computer arithmetic, in this section we will put
boxes around operators and functions which compute inexact answers;
$A:=B+C$ really means $A:=B\boxp C$.

Ordinarily, a computer's arithmetic unit is designed to give the true result
when it is representable. If not representable, the design may give a 
truncated result, which is the next smaller (in absolute value) representable
number, or a rounded result, which is the closest representable number.

Suppose, for illustration only, that our representable numbers are decimal
numbers with three significant digits precision, including the numbers
1.00, 1.01, 1.02, $\ldots$, 9.99, 10.0, 10.1, $\ldots$, 99.9, 100., 101.,
and so on. Here are some examples of the values $A:=B+C$ would give
to~$A$, on a machine that rounds and a machine that truncates.

$$\vcenter{\halign{$\lft{#}$\quad&
&$\ctr{#}$\quad
&$\ctr{#}$\quad
&$\ctr{#}$\quad
&$\ctr{#}$\cr
B&C&B+C&B\boxp C\hbox{ (rounded)}&B\boxp C\hbox{ (truncated)}\cr
\noalign{\smallskip}
1.23&4.56&5.79&5.79&5.79\cr
6.23&4.56&10.79&10.8&10.7\cr
1.23&.456&1.686&1.69&1.68\cr}}$$

\noindent
Actual computers work with binary numbers, and have higher precision
corresponding to six, eight, or more significant decimal digits, but the
qualitative effects of rounding error are similar.

Let $\epsilon↓M$ be the smallest value of ${v-u\over u}$ for any two
consecutive representable positive numbers~$u$ and~$v$. In our example
machine, $\epsilon↓M={1.01-1.00\over 1.00}={10.1-10.0\over 10.0}=0.01$.
In a well-designed computer, in a step of arithmetic where the true
result~$z$ lies between two consecutive representable positive numbers,
$0<x<z<y$, the computed result~$w$ is always~$x$ or~$y$.
$$\eqalign{{y-x\over x}&≤\epsilon↓M\cr
\noalign{\smallskip}
y&≤x(1+\epsilon↓M)<z(1+\epsilon↓M)\cr
\noalign{\smallskip}
x&≤{y\over 1+\epsilon↓M}>{z\over 1+\epsilon↓M}={z(1-\epsilon↓M)\over 
 (1-\epsilon↓M)↑2}≥z(1-\epsilon↓M)\cr
\noalign{\smallskip}
z(1-\epsilon↓M)&<w<z(1+\epsilon↓M)\cr}$$
and the computed result is always $z(1+e)$, where $e$, the {\it relative error},
is a small number with absolute value less than~$\epsilon↓M$. In a rounding
machine, the absolue value of~$\epsilon$ is less than $\epsilon↓M/2$. In a
truncating machine, $e$~is negative or zero.

%Let $\epsilon↓M$ be the largest value of ${y-x\over x}$, where $x$ and~$y$
%are two consecutive positive representable numbers.

%In this example $\epsilon↓M=1.01-1.00=0.01$. If a computed result lies
%between 1.00 and~10.0, on a truncating machine, it will be smaller
%than the true result, by not as much as~0.01:
%$$B+C-0.01<B\boxp C≤B+C\,.$$
%On a rounding machine, it will be within $\epsilon↓M/2$ of the true result:
%$$B+C-0.01/2<B\boxp C≤B+C+0.01/2\,.$$
%If the computed result lies between 10.0 and 100.0, we would find
%$$\eqalign{B+C-10\times 0.01<B\boxp C&≤B+C\hbox{ (truncated)}\cr
%B+C-10\times 0.01/2<B\boxp C&≤B+C+10\times 0.01/2\hbox{ (rounded)}\cr}$$
%and so on, for other ranges of values. We may sum this up in a formula
%showing the range of the ratio $(B\boxp C)/(B+C)$:
%$$\eqalign{1-\epsilon↓M<(B\boxp C)/(B+C)&≤1\quad\hbox{(truncated)}\cr
%1-\epsilon↓M/2<(B\boxp C)/(B+C)&≤1+\epsilon↓M/2\quad\hbox{(truncated)}\cr}$$

%The difference between the ratio $(B\boxp C)/(B+C)$ and 1.0 we call the
%{\sl relative error\/}. Sub\-tracting one from each term in the inequalities
%above,
%$$\eqalign{-\epsilon↓M<\hbox{ relative error }≤0\quad\hbox{(truncated)}\cr
%-\epsilon↓M/2<\hbox{ relative error }≤\epsilon↓M/2\quad\hbox{(rounded).}\cr}$$
%Letting $e$ stand for the relative error of the $\boxp$ operation, we find
%$$\eqalign{(B\boxp C)/(B+C)-1&=e\,,\quad\hbox{or}\cr
%B\boxp C&=(B+C)(1+e)\,,\quad\hbox{where}\cr
%&\qquad\qquad -\epsilon↓M≤e<0\quad\hbox{(truncated)}\cr
%&\qquad\qquad |e|≤\epsilon↓M/2\quad\hbox{(rounded)}\cr}$$
%From this characterization, we can analyze the errors in more complicated
%computations.

Suppose we write in Pascal
$$A:=B\ast C+D\ast E\,.$$
we get $A:=B\boxa C\boxp D\boxa E$, which is really
$$A:=\bigl((B\ast C)(1+e↓1)+(D\ast E)(1+e↓2)\bigr)(1+e↓3)\,.$$
We can expand this to
$$\eqalign{A:=&\bigl((B\ast C)(D\ast E)\bigr)+e↓1(B\ast C)+e↓2(D\ast E)
+e↓3(B\ast C+D\ast E)\cr
&\qquad +e↓1e↓3(B\ast C)+e↓2e↓3(B\ast C)\,.\cr}$$
Because each of $e↓1$, $e↓2$, $e↓3$ is very small compared to~1, we can
ignore as negligible products like~$e↓1e↓3$, so
$$A:=(B\ast C+D\ast E)+e↓1(B\ast C)+e↓2(D\ast E)+e↓3(B\ast C+D\ast E)\,.$$
On a rounding machine, assuming all variables are positive, the smallest
and largest possible values for $A$ are obtained by setting all the $e$'s
to $-\epsilon↓M/2$ and $+\epsilon↓M/2$ respectively; then $A$ can be
given any value in the range $(B\ast C+D\ast E)(1\pm\epsilon↓M)$.

Here is a program to estimate $\ln 2=\int↓1↑2dx/x$:

\smallskip
\halign{\qquad\qquad\lft{\tt #}\cr
S:=0\cr
FOR I:=N TO 2*N-1 DO\cr
\qq S:=S+1/I;\cr
WRITE (S)\cr}

\smallskip\noindent
(The program approximates $\int↓{I/N}↑{(I+1)/N}dx/x$ by the rectangular
approximation $dx=1/N$, $x=I/N$, $dx/x=1/I$).

The program actually gets, successively
$$\vcenter{\halign{$\rt{#}\;$&$\lft{#}\;$&$\lft{#}$\cr
S&=1\boxd N&=1/N(1+e↓1)\cr
\noalign{\medskip}
S&=\bigl(1\boxd N\boxp 1\boxd (N+1)\bigr)&=\bigl(1/N(1+e↓1)+1/(N+1)(1+e↓2)\bigr)
(1+e↓3)\cr
\noalign{\medskip}
S&=\bigl(1\boxd N\boxp 1\boxd (N+1)\bigr)+1\boxd (N+2)\cr
\noalign{\smallskip}
&\multispan2\hfil $=\bigl(\bigl(1/N(1+e↓1)+1/(N+1)(1+e↓2)\bigr)
(1+e↓3)+1/(N+2)(1+e↓4)\bigr)(1+e↓5)$\cr}}$$
and so on.

The final sum is, after discarding second order terms,
$$\eqalign{S=&\biggl(\,{1\over N}+{1\over N+1}+\cdots +{1\over 2N-1}\biggr)+\cr
\noalign{\smallskip}
&\quad {1\over N}(e↓1+e↓3+e↓5+\cdots)+\cr
\noalign{\smallskip}
&\quad {1\over N+1}(e↓2+e↓3+e↓5+\cdots)+\cr
\noalign{\smallskip}
&\quad {1\over N+2}(e↓4+e↓5+e↓7+\cdots)+\cdots +\cr
\noalign{\smallskip}
&\quad {1\over 2N-1}(e↓{2N-2}+e↓{2N-1})\cr}$$
Assume a truncating machine.
Since, on the average,the $e$'s are $-\epsilon↓M/2$, we can estimate the error as
$$\eqalign{&-{\epsilon↓M\over 2}\biggl(\,{N\over N}+{N-1\over N+1}+\cdots
+{1\over 2N-1}\,\biggr)=\cr
\noalign{\smallskip}
&-{\epsilon↓M\over 2}\biggl(\,{2N-N\over N}+{2N-(N+1)\over N+1}+\cdots
+{2N-(2N-1)\over 2N-1}\,\biggr)=\cr
\noalign{\smallskip}
&-{\epsilon↓M\over 2}\biggl(2N\biggl(\,{1\over N}+{1\over N+1}+\cdots
+{1\over 2N-1}\,\biggr)-N\biggr)\approx\cr
\noalign{\smallskip}
&-\epsilon↓MN(\ln 2-0.5)=-0.193\epsilon↓MN\cr}$$

The approximation of the integral by a finite sum itself introduces an
error, of about $-{1\over 2N}$. The combination of the two,
$-{1\over 2N}-0.193\epsilon↓MN$, can not be made very small; if $N$ is
chosen large enough to make the first term small, it makes the second
one large. The error is minimized when
${d\over dN}\bigl(-{1\over 2N}-0.193\epsilon↓MN\bigr)=0$, or
${1\over 2N↑2}-0.193\epsilon↓M=0$, $N=\sqrt{1/(2-0.193\epsilon↓M)}$;
on the DEC-20, where $\epsilon↓M=2↑{-26}$, the best~$N$ is about 13000,
and the error is about $-0.000076$.

This example is not presented as a good way to calculate $\ln 2$. It is
obviously both slow and inaccurate. It was chosen because it was simple
enough to analyze fully. It shows how a full-scale error analysis of a
computer program can be carried out using no more than first-year
calculus, and it shows the pitfalls of numerical imprecision. Without
the analysis, a programmer might perform the calculation with
$N=10↑6$ to make the sum a very accurate approximation to the integral;
the truncated result, however, would be off by about 0.0029, so only
the first two decimal digits would be correct. Setting $N$ to $10↑8$
would make the result entirely worthless.

A programmer who carries out numerical calculations where precision
is important should either be skilled in error analysis, or should
know how to recognize potentially dangerous computations so that a
professional numerical analyst can be consulted. Above all, never
assume just because the computer prints your results as eight-digit
numbers that they are accurate. Computers are just as happy printing
eight-digit tables of garbage as they are printing accurate answers
from correct programs.

For a deeper treatment of numerical precision, see Forsythe, Malcolm,
and Moler, {\sl Computer Methods for Mathematical Computations\/}.

Let's compare the errors made by rounding and truncating machines.
Take this program fragment:

\smallskip
\halign{\qquad\qquad\lft{\tt #}\cr
SUM := 0.0;\cr
FOR I:=1 TO N DO\cr
\qq BEGIN\cr
\qq READ(X);\cr
\qq SUM := SUM+X;\cr
\qq END;\cr
WRITE(SUM)\cr}

\smallskip\noindent
Assume the values of $X$ result from some reasonably accurate physical
measurement, and that they are all positive. The computation actually gets
$$\eqalign{&\bigl(\bigl(\bigl(\cdots\bigl((x↓1+x↓2)(1+e↓2)+x↓3\bigr)
(1+e↓3)+\cdots +x↓n\bigr)\bigr)\cdots\bigr)\bigr)(1+e↓n)\cr
&\quad\approx \sum↓{i=1}↑Nx↓i+e↓2(x↓1+x↓2)+e↓3(x↓1+x↓2+x↓3)+\cdots
+e↓n(x↓1+x↓2+\cdots +x↓n)\,.\cr}$$
If ${\bar x}$ is the average value of the $x↓i$, a truncating machine
gets $\sum↓{i=1}↑Nx↓i=N{\bar x}$ with an error at most
$-\epsilon↓M\bigl(Nx↓1+(N-1)x↓2+\cdots +1\cdot x↓N)$,
or about $-{N(N+1)\epsilon↓M\over 2}{\bar x}$; the relative error
is at most about $-{N+1\over 2}\epsilon↓M$.

If we execute the same program on a rounding machine, the worst case
either has each $e↓i=\epsilon↓M/2$, or $e↓i=-\epsilon↓M/2$; the relative
error of the result can be either positive or negative, and can be as
large as ${N+1\over 4}\epsilon↓M$ in absolute value. Usually, though,
the errors include about equal numbers of positive and negative
values. If you don't know anything about statistics, shut your eyes
while reading the next paragraph.

Each $e↓i$ is a random variable with mean 0, variance $\epsilon↓M↑2/12$.
The error in the sum therefore has mean~0 and variance
${\bar x}↑2{\epsilon↓M↑2\over 12}(1↑2+2↑2+3↑2+\cdots +N↑2)\approx
{\bar x}↑2\epsilon↓M↑2{N↑3\over 36}$, with a standard deviation
${\bar x}\epsilon↓MN↑{3/2}/6$. The standard deviation of the relative
error is then, dividing by $N{\bar x}$, about $\sqrt{N}\epsilon↓M/6$.

On a machine like the DEC-20, with $\epsilon↓M=2↑{-26}$, suppose $N=10↑6$. Then the
relative error using truncated arithmetic is $-10↑6\cdot 2↑{-26}/2=-0.0075$
at worst, and $-0.0037$ on the average; we can expect two-place accuracy
in results. Using rounded arithmetic the relative error is at worst
$\pm 0.0037$, but ``on the average'' $\pm 0.0000025$; that is, two thirds
of the time, the relative error will be no more than $2.5\times 10↑{-6}$
in absolute value, so the result is almost always correct to five places.

The analysis above is typical of the difference between rounding and
truncating computers. Truncating machines are faster, and easier subjects
of error analysis, but rounding is usually much more accurate, because
errors tend to nearly cancel. Many truncating computers offer an
option of two precisions, perhaps $\epsilon↓M\approx 10↑{-6}$
and $\epsilon↓M\approx 10↑{-14}$; if precision is important, use the
higher precision.

Another precision tradeoff: I have an algorithm to compute $f(x)$
with precision about~$\epsilon↓M$, but no direct method to calculate
its derivative, so I~compute an approximate derivative as
$${f(x+d)-f(x)\over d}\,.$$
How large should $d$ be? By Taylor's theorem,
$$f(x+d)\approx f(x)+df'(x)+{d↑2\over 2}f''(x)+\cdots\,,$$
so we get, roughly
$$\eqalign{&\bigl(\bigl(f(x)+df'(x)+d↑2/2\,f''(x)\bigr)
(1+e↓1)\boxm f(x)\bigr)\boxd d\approx\cr
&\bigl(\bigl(df'(x)+d↑2/2\,f''(x)+e↓1f(x)\bigr)(1+e↓2)/d\bigr)(1+e↓3)\approx\cr
&\quad f'(x)+d/2\,f''(x)+e↓1\,f(x)/d+(e↓2+e↓3)f'(x)\,.\cr}$$

The portion of the error dependent on $d$ is ${d\over 2}f''(x)+{e↓1\over d}f(x)$,
which is minimized when ${f''(x)\over 2}-{e↓1f(x)\over d↑2}=0$,
or $d↑2=2e↓1\,f(x)/f''(x)$. If $f(x)$ and $f''(x)$ are on the
order of~1 $\bigl($e.g., $f(x)=\sin(x)\bigr)$, $d$~should be about
$\sqrt{2e↓1}$, or $\sqrt{\epsilon↓M}$. On a typical computer,
we should set $d=10↑{-4}$ for best results, and the error would be
around~$10↑{-4}$. Once again, because of a tradeoff between two sources
of error, each large when the other is small, an eight-significant-digit
calculation only gives four-significant-digit results.

\smallskip\noindent
{\bf Exercise:} Would ${f(x+d)-f(x-d)\over 2d}$ be a better approximate
derivative? Work out the error analysis, and also compute some actual
errors where $f(x)=\sin(x)$, $e↑x$, and/or $\ln x$.

\smallskip
The above discussion of precision is not itself completely precise. Don't
assume that your computer always truncates or rounds perfectly unless you
have studied its circuitry. Occasional departures happen like this
(assume a 3-digit decimal machine):

\smallskip
\disleft 25pt:(1):
On a rounding machine, adding 123 to 0.567, it ignores the 0.567, giving
123 instead of~124.

\smallskip
\disleft 25pt:(2):
On a truncating machine, subtracting 0.0456 from 123, it ignores the
0.0456, giving 123 instead of~122.

\smallskip
\disleft 25pt:(3):
On either machine, subtracting 99.8 from 101, it discards the 0.8, giving
2.00 instead of~1.20.

\smallskip
Similarly, the assumption that products of relative errors are insignificant
has exceptions. I~can find examples where ignoring the products greatly
understates the error.

\vfill\eject

\line{\bf Overflow and Underflow\hfil}

\smallskip
\line{\it Yo Feet's Too Big\hfil}

The Pascal Standard tacitly acknowledges that present-day computers are
designed to work with numbers of limited size and precision.

Integers are limited in size to having absolute values no greater than the
symbolic constant {\tt MAXINT}, which is implementation-defined; that is,
it has different values on different computers. A~computation which at any
stage attempts to produce a larger integer value than {\tt MAXINT} is not
defined. In practice, that means it may give an undetected wrong answer, 
or that it may stop with or without explanation. Integer computations with all
intermediate and final results in the range {\tt $\pm$~MAXINT}, however,
are performed exactly.

Computation on real numbers is left almost entirely undefined by Standard
Pascal. In practice, real numbers must lie in certain ranges, and are 
computed with high enough precision for most scientific and engineering
computation. The details of precision are discussed later; typical range
restrictions are presented here.

Usually, numbers must fall in three ranges, as shown schematically below.

$$\vcenter{\offinterlineskip
\halign{\hfil#\hfil\qquad&\hfil#\xskip&\hfil#\hfil\xskip&#\hfil\qquad&\hfil#\hfil\cr
{\tt MAXNEGREAL}&{\tt MINNEGREAL}&&{\tt MINREAL}&{\tt MAXREAL}\cr
\noalign{\smallskip}
$\downarrow$&$\searrow$&&$\swarrow$&$\downarrow$\cr
$\bullet$&$\bullet$&$\bullet$&$\bullet$&$\bullet$\cr
\noalign{\smallskip}
&&0\cr}}$$

\noindent For discussion, we call the ends of the negative range
{\tt MAXNEGREAL} and {\tt MINNEGREAL}, and the ends of the positive range
{\tt MINREAL} and {\tt MAXREAL}. These names are not part of Standard Pascal.
The absolute values of {\tt MAXREAL} and {\tt MAXNEGREAL} are quite large,
perhaps around $10↑{38}$ or~$10↑{76}$; often they are equal. Those of
{\tt MINREAL} and {\tt MINNEGREAL} are quite small, but not zero, perhaps
around $10↑{-39}$ or $10↑{-77}$; again, equality is common. If a number
is greater than {\tt MAXREAL}, or less than {\tt MAXNEGREAL}, it is out
of the range Pascal can represent; an attempt to compute such a~number
is called {\it overflow\/}. An overflow may give rise to an undetected
wrong answer, or the program may stop with or without explanation. Similarly,
a~non-zero number between {\tt MINREAL} and {\tt MINNEGREAL} is out of
range; an attempt to compute such a number is called {\it underflow\/},
and may give rise to much the same unpleasantness as overflow.

Suppose {\tt MAXREAL} were $9.999\times 10↑9$ and {\tt MINREAL} were
$1.000\times 10↑{-9}$. Examples of overflow would be
$5000000000.0+6.000000000.0$, $3.00000.0\times 40000.0$, and
$1234567890/0.0002$; examples of underflow would be
$1/50000-1/49999$, $(1/50000)\ast(1/49999)$, and $(1/50000)/49999$.

Some common values of {\tt MAXINT}, {\tt MAXREAL}, and {\tt MINREAL} are
shown below:

$$\vcenter{\halign{#\hfil\qquad&\hfil#\hfil\qquad&\hfil#\hfil\qquad%
&\hfil#\hfil\qquad&\hfil#\hfil\cr
Computer&Word length&{\tt MAXINT}&{\tt MAXREAL}&{\tt MINREAL}\cr
\noalign{\smallskip}
DEC-20&36&34,359,738,367&$1.7\ast 10↑{38}$&$1.5\ast 10↑{-39}$\cr
VAX&16&\phantom{3}2,147,483,647&$1.7\ast 10↑{38}$&$2.9\ast 10↑{-39}$\cr
IBM 360, 370, 30xx, 43xx&32&\phantom{3}2,147,483,647&$7.2\ast 
10↑{75}$&$5.4\ast 10↑{-79}$\cr
IBM PC&16&\phantom{34,359,7}32,767\cr}}$$

\noindent
While overflow and underflow are not common, the wise programmer will take
precautions against them, remembering that a program that always works
on one computer may overflow on another.

As an example of how an underflow might arise in an otherwise reasonable
program, consider computing an approximate value of~$e↑x$ by the
polynomial $1+x+x↑2/2+x↑3/6+\cdots +x↑{10}/10!$; on some computers, the
last term will underflow when $|x|$ is around 0.0001.

\bigskip
{\baselineskip0pt
\line{\bf\phantom{Wh}Good\hfill}
\line{\bf What $\,$ Are the Standard Functions of Pascal?\hfill}
\line{\bf\phantom{What}$\,↑{∧}$\hfill}}

Every standard operator and function of a programming language, ideally, is
included for the same reason that I~include a tool in my toolbox:
it's so useful that it's worth the effort of carrying it around.  I don't agree
with all the choices the designers of Pascal made, but I'll try to show why
they felt a certain set of functions was useful.

\bigskip\noindent
{\bf Standard Functions.}

{\tt SQR$(x)$} is the square of~$x$.  
Pascal does not have a built in operation to raise
a number to an arbitrary power, because the best choice of algorithm depends on
so many considerations, only the programmer should decide it.  {\tt SQR}$(x)$ 
is the most common case, and many others like 
$x↑8= {\tt SQR}\bigl({\tt SQR}({\tt SQR}(x))\bigr)$, 
can be built up from it. The
argument may be {\tt REAL} or {\tt INTEGER}; 
the result has the same type.  The only pitfall
is overflow   if  $x$~is an integer larger than
$\sqrt{\tt MAXINT}$, or a real larger than $\sqrt{\tt MAXREAL}$.

{\tt SQRT}$(x)$ is the positive square root of~$x$, where $x$~may 
be {\tt REAL} or {\tt INTEGER}; the
result is {\tt REAL}.  
The pitfall is that the function is lethally undefined if
$x<0$.  In some situations, exact calculation of~$x$ would make~$x$ positive,
but rounding errors make it negative.  
This pitfall may occur in \naive\ statistical
calculations of standard deviations, especially 
where the mean is much larger than the standard deviation.

{\tt ABS}$(x)$ is the absolute value of~$x$, 
where $x$~may be {\tt REAL} or {\tt INTEGER}; the result is of
the same type.  
It is used frequently in programming to test whether the difference
between two numbers is small, as in

{\obeylines\obeyspaces\let =\ \tt
        IF ABS($x$-$y$)<0.000001 THEN ...
}

\smallskip\noindent
The only pitfall: some systems have a negative integer equal to
{\tt -(MAXINT+1)}; if $x$~has that value, {\tt ABS}$(x)$ 
overflows.

{\tt SIN}$(x)$, {\tt COS}$(x)$ 
are the standard trigonometric functions sine and cosine, where~$x$ 
({\tt REAL} or {\tt INTEGER}) 
is measured in radians; the result is {\tt REAL}.  If $x$~is
measured in degrees, use 
$\sin(\pi x/180)$, where $\pi =3.141592654$.  Other direct
trigonometric functions such as $\tan(x)=\sin(x)/\cos(x)$
are usually computed from the sine and cosine.
 The major pitfall of {\tt SIN} and {\tt COS} is that if
$x$~is much larger than~1, {\sl  relatively\/} small rounding errors 
in~$x$ give rise to errors in {\tt SIN}$(x)$ and {\tt COS}$(x)$
that are {\sl relatively\/} large. For example, if $x=10↑6$, a~relatively
small error of $0.01=10↑{-8}x$ in~$x$ gives rise to
a relatively large error, possibly near 0.01, in {\tt SIN}$(x)$, which
is no greater than one.

{\tt ARCTAN}$(x)$is the angle between $-\pi/2$ and $\pi/2$ whose tangent 
is~$x$; $x$~is {\tt REAL} or
{\tt INTEGER}, and the {\tt REAL} 
result is measured in radians.  For a result in degrees,
use $180\arctan(x)/\pi$.  The only serious pitfall arises when computing 
{\tt ARCTAN}$(x/y)$; if~$y$ 
becomes zero, division overflow occurs, even though the desired angle
is well defined.  To avoid this, use (say) 
$\pi/4-${\tt ARCTAN}$(x/y)$ when $|x/y|>1$.

{\tt EXP}$(x)$ is the exponential function $e↑x$, where $e$ is 2.718281828; 
$x$~is {\tt REAL} or
{\tt INTEGER}, and the result is {\tt REAL}.  
Overflow occurs unless $x$~is in the range
${\tt LN(MINREAL)} ≤x≤ {\tt LN(MAXREAL)}$, typically limiting~$x$ to 
about~$\pm 88$.
Another minor pitfall is that if $x$~is in the outer
portion of its range, {\sl relatively\/} small rounding errors in~$x$ result in 
{\sl relatively\/} larger errors in {\tt EXP}$(x)$, by a factor of nearly~200.

{\tt LN}$(x)$ is the natural (base~$e$) logarithm of~$x$; $x$~is 
{\tt REAL} or {\tt INTEGER}, with a
{\tt REAL} result.  For base 10 logarithms, use $\ln(x)/\ln(10)$ 
or the non-standard
function {\tt LOG}$(x)$.  A~pitfall is the lethal error $x≤0$.  
Less obvious is that if
$x$~is close to~1, a~{\sl relatively\/} small error in~$x$ 
can result in {\sl relatively\/}
enormous errors in {\tt LN}$(x)$; 
for example, changing $x$ from 1.0001 to 1.0002 doubles {\tt LN}$(x)$.

{\tt TRUNC}$(x)$ is the integer part of a real number~$x$.  
{\tt TRUNC(2.0)=TRUNC(2.99)=2}; symmetrically,
{\tt TRUNC(-2.0)=TRUNC(-2.99)=-2}. 
{\tt TRUNC} is useful in going from ``how much'' to
``how many'', as in determining how many standard sized shelves can be cut
from a board, {\sl but\/} it is seldom right to use {\tt TRUNC}
if $x$~is negative.  If $x$~is
positive, {\tt TRUNC}$(x)$ is the largest integer that doesn't 
exceed~$x$, but if $x$~is
negative {\tt  TRUNC}$(x)$ 
is larger than~$x$, except when $x$~has an exact integer value.
To classify incomes into \$1000 ranges for tabulation, one might think of using
{\tt TRUNC(INCOME/1000)}, but that formula classifies the much larger
range from $-\$999$ to \$999 as zero.
The non-standard function
${\tt FLOOR}(x)\;\;\bigl($or {\tt ENTIER}$(x)\bigr)$, 
the largest integer not
exceeding~$x$, it is probably more useful.  Another pitfall is overflow 
if $|x|>{\tt MAXINT}$.
To find how many times {\tt B} goes into {\tt A}, 
both being positive real numbers, use {\tt TRUNC(A/B)}.

{\tt ROUND}$(x)$ is the integer closest to~$x$.  If the
fractional part of~$x$ is~1/2, it is
rounded to a larger absolute value.  That is, 
{\tt ROUND(2.5)=ROUND(3.499)=3};
{\tt ROUND(-2.5)=ROUND(-3.499)=-3}.  
The major pitfall of {\tt ROUND} is overflow if
$|x|>{\tt MAXINT}$.

{\tt ORD}$(x)$, where $x$~belongs to any ordinal type ({\tt CHAR} 
and enumerated types
especially), is the ({\tt INTEGER}) ordinal number by which~$x$ is enumerated.  The
major pitfall is that {\tt ORD} is implementa\-tion-dependent when $x$~is a
character.
If $x$~is a digit character, however, 
${\tt ORD}(x)-{\tt ORD}('0')$ is its numerical value.
Another pitfall is that except for {\tt INTEGER}, the ordinal numbers
of all ordinal types start with zero.


%\bigskip
%\parindent0pt
%\copyright 1984 Robert W. Floyd

%First draft March 28, 1984



\bigskip
\parindent0pt
\copyright 1984 Robert W. Floyd

First draft July 5, 1984 

\bye